Install metacoder for data and plotting

devtools::install_github("grunwaldlab/metacoder") # do this before trying to knit
library(metacoder)
library(taxa)
## 
## Attaching package: 'taxa'
## The following objects are masked from 'package:metacoder':
## 
##     get_subtaxa, get_taxon_items
## The following object is masked from 'package:stats':
## 
##     binomial

Pepare data

file_path <- system.file("extdata", "unite_general_release.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
unite_ex_data_3 <- extract_taxonomy(names(sequences),
                                    regex = "^(.*)\\|(.*)\\|(.*)\\|.*\\|(.*)$",
                                    key = c(name = "item_info", seq_id = "item_info",
                                            other_id = "item_info", "class_name"),
                                    database = "none")

Creating the class from scratch

Normally, users would not create the class from scratch, since this is rather complex. Instead, the class would be the output of functions like metacoder::extract_taxonomy. unite_ex_data_3 is an example of the current output format of metacoder::extract_taxonomy, but I would like to change its output to a classified object.

data <- classified(taxon_id = unite_ex_data_3$taxa$taxon_id,
                   parent_id = unite_ex_data_3$taxa$parent_id,
                   item_taxon_id = unite_ex_data_3$items$taxon_id,
                   taxon_data = data.frame(name =  unite_ex_data_3$taxa$name,
                                           rank = unite_ex_data_3$taxa$rank),
                   item_data = data.frame(seq_id = unite_ex_data_3$items$seq_id))

Acessing taxon data

Note how the item_count column is calculated each time it is needed. This is because the number of items per taxon can change when taxa are subsetted.

head(taxon_data(data))
##   taxon_id parent_id            name rank item_count
## 1        1      <NA>           Fungi    k        500
## 2        2         1      Ascomycota    p        180
## 3        3         2   Leotiomycetes    c         32
## 4        4         3      Helotiales    o         28
## 5        5         4 Hyaloscyphaceae    f         25
## 6        6         5         Lachnum    g         17

Acessing item data

head(item_data(data))
##   taxon_id   seq_id
## 1        7 JQ347180
## 2        9   U59145
## 3        7 AM084756
## 4        7 FM172814
## 5        7 FN539058
## 6       10 AB481260

Subsetting and plotting

I already made a plot method for classified in metacoder that calls metacoder::plot_taxonomy. I also added some non-standard argument evaluation for the subsetting and plotting methods to save users some typing. Columns in classified_object$taxon_data can be refereed to by name alone.

plot(data, 
     vertex_size = item_count,
     vertex_color = item_count,
     vertex_label = name)

Note how this:

plot(data[name == "Ascomycota", ],
     vertex_size = item_count,
     vertex_color = item_count,
     vertex_label = name)

… is the same as this:

data_subset <- data[data$taxon_data$name == "Ascomycota", ]
plot(data_subset,
     vertex_size = taxon_data(data_subset)$item_count,
     vertex_color = taxon_data(data_subset)$item_count,
     vertex_label = data_subset$taxon_data$name)

The [[ currently corresponds to a more restrictive type of subsetting, in which the parents and children of taxa are not automatically preserved.

plot(data[[item_count > 5]],
     vertex_size = item_count,
     vertex_color = item_count,
     vertex_label = name)

plot(data[[item_count < 200]],
     vertex_size = item_count,
     vertex_color = item_count,
     vertex_label = name,
     layout = "fruchterman-reingold")

However, it is still not working right as you might expect. I think this is because, when a child taxon is eliminated, its items are not transferred to the parent taxon. This is why some taxa in the above two plots have 0 items. Their children all together had > 5 items, but each child had < 5 items. I think the solution is to transfer the items of eliminated children to their non-eliminated parents, but I need to look into this more.